Fix Python/FastAPI/SQL parsing: route false positives, Depends() tracking, SQL size guard, DLL calls#66

Closed
kingchenc wants to merge 2 commits into DeusData:main from kingchenc:main
Conversation

@kingchenc
Contributor

Summary

Details

#28 — Python dict .get() misidentified as Route nodes

Source-based route extractors (extractGoRoutes, extractExpressRoutes, extractLaravelRoutes, extractKtorRoutes) were running on all function nodes regardless of file type. The Ktor regex \b(get|post|...)\("..." matched payload.get("sub") in Python files, creating ~125 false Route nodes.

Fix: File extension guard via switch filepath.Ext() — each extractor only runs on its own language (.go, .js/.ts, .php, .kt).
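The guard described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the helper name `extractorsFor` and its return shape are hypothetical, and lowercasing the extension is an added assumption. Only the extractor names and the extension mapping come from the description.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// extractorsFor returns the names of the source-based route extractors that
// should run for a file. The helper itself is hypothetical; the mapping
// follows the PR description (.go, .js/.ts, .php, .kt).
func extractorsFor(path string) []string {
	// Lowercasing the extension is an assumption, not stated in the PR.
	switch strings.ToLower(filepath.Ext(path)) {
	case ".go":
		return []string{"extractGoRoutes"}
	case ".js", ".ts":
		return []string{"extractExpressRoutes"}
	case ".php":
		return []string{"extractLaravelRoutes"}
	case ".kt":
		return []string{"extractKtorRoutes"}
	default:
		// Python and every other language: no source-based extractor runs,
		// so payload.get("sub") can no longer match the Ktor regex.
		return nil
	}
}

func main() {
	for _, p := range []string{"auth/jwt.py", "routes/Api.kt", "server.ts"} {
		fmt.Printf("%s -> %v\n", p, extractorsFor(p))
	}
}
```

With this shape, a `.py` file yields no extractors at all, which is what removes the spurious Route nodes.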

#27 — FastAPI Depends() not tracked

Functions passed to Depends() as parameter defaults (e.g. user = Depends(get_current_user)) were not extracted as calls — making critical auth/DI functions appear as dead code with in_degree=0.

Fix: New extractPythonDependsEdges() scans Python function signatures for Depends(func_ref) patterns and emits CALLS edges (resolution_strategy: "fastapi_depends"). Includes fallback for import aliases (from X import Y as Z) by extracting the original function name from the import path.

Tested: 392 Depends edges across 39 router files on a real FastAPI project. require_admin went from in_degree: 64 to in_degree: 180.
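A simplified sketch of the signature scan, assuming a plain regex over the signature text. The `Edge` type, the `dependsEdges` helper, and the regex are hypothetical stand-ins for whatever extractPythonDependsEdges() does internally, and the import-alias fallback is left out.

```go
package main

import (
	"fmt"
	"regexp"
)

// Edge is a hypothetical, stripped-down call edge used only for this sketch.
type Edge struct {
	Caller, Callee, Strategy string
}

// dependsRe captures the reference passed to Depends(...), for example
// Depends(get_current_user) or Depends(auth.get_current_user).
var dependsRe = regexp.MustCompile(`\bDepends\(\s*([A-Za-z_][A-Za-z0-9_.]*)\s*\)`)

// dependsEdges scans one Python function signature and emits a CALLS-style
// edge for each function reference found inside Depends().
func dependsEdges(caller, signature string) []Edge {
	var edges []Edge
	for _, m := range dependsRe.FindAllStringSubmatch(signature, -1) {
		edges = append(edges, Edge{Caller: caller, Callee: m[1], Strategy: "fastapi_depends"})
	}
	return edges
}

func main() {
	sig := "def read_me(user = Depends(get_current_user), db: Session = Depends(get_db)):"
	for _, e := range dependsEdges("read_me", sig) {
		fmt.Printf("%s -CALLS-> %s (%s)\n", e.Caller, e.Callee, e.Strategy)
	}
}
```

A bare `Depends()` with no argument produces no edge here, since there is no function reference to resolve.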

#62 — Stack overflow in tree-sitter SQL parser

Large .sql files (bulk INSERT dumps, ~4.5MB) cause deep recursion in the tree-sitter SQL grammar, exhausting the C stack. This is most likely on Windows, where the default stack size is 1MB.

Fix: Per-language file size guard in cbmParseFile(): SQL >1MB skipped, any file >4MB skipped. Logged as cbm.skip.large_sql / cbm.skip.large_file.
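The size check can be pictured along these lines. This is a sketch under assumptions: the `skipReason` helper and its constants are hypothetical, 1MB is taken to mean 1<<20 bytes, and checking the SQL limit first (so an oversized SQL dump is logged as large_sql rather than large_file) is a guess about ordering that the PR does not spell out.

```go
package main

import "fmt"

const (
	maxSQLBytes  = 1 << 20 // assumed meaning of "1MB"
	maxFileBytes = 4 << 20 // assumed meaning of "4MB"
)

// skipReason returns the log key for a file that should be skipped before
// parsing, or "" when the file is small enough to parse. Hypothetical helper.
func skipReason(lang string, size int64) string {
	switch {
	case lang == "sql" && size > maxSQLBytes:
		return "cbm.skip.large_sql"
	case size > maxFileBytes:
		return "cbm.skip.large_file"
	}
	return ""
}

func main() {
	fmt.Printf("%q\n", skipReason("sql", 4_500_000))  // oversized SQL dump
	fmt.Printf("%q\n", skipReason("python", 200_000)) // ordinary source file
}
```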

#29 — Dynamic DLL calling not tracked

C/C++ code using GetProcAddress(handle, "Func"), dlsym(handle, "func"), or .Resolve("Func") for dynamic DLL loading had no call graph edges to the resolved functions.

Fix: New extractDLLResolveEdges() detects these patterns via regex and creates CALLS edges to synthetic stub nodes that carry dll_name/dll_function metadata. The stubs are created during the sequential flush phase (same path as LSP stub nodes).
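As a rough illustration of the pattern matching, the sketch below pulls the resolved function names out of C/C++ source text. The regex and the `resolvedFunctions` helper are hypothetical and much simpler than a production extractor would need to be; stub-node creation and the dll_name metadata are omitted.

```go
package main

import (
	"fmt"
	"regexp"
)

// dllResolveRe matches GetProcAddress(h, "Func"), dlsym(h, "func"), and
// .Resolve("Func"). Group 1 or group 2 holds the resolved function name.
var dllResolveRe = regexp.MustCompile(
	`\b(?:GetProcAddress|dlsym)\s*\(\s*[^,()]+,\s*"([A-Za-z_]\w*)"` +
		`|\.Resolve\s*\(\s*"([A-Za-z_]\w*)"`)

// resolvedFunctions returns the function names looked up dynamically in src,
// in order of appearance.
func resolvedFunctions(src string) []string {
	var names []string
	for _, m := range dllResolveRe.FindAllStringSubmatch(src, -1) {
		name := m[1]
		if name == "" {
			name = m[2] // the .Resolve("...") alternative matched
		}
		names = append(names, name)
	}
	return names
}

func main() {
	src := `
auto create = GetProcAddress(hLib, "CreateWidget");
void *destroy = dlsym(handle, "destroy_widget");
auto ping = lib.Resolve("Ping");
`
	fmt.Println(resolvedFunctions(src))
}
```

Each returned name would then become the dll_function of a synthetic stub node that the calling function points to.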

Test plan

Changed files

  • internal/httplink/httplink.go — file extension guard in discoverRoutes()
  • internal/pipeline/pipeline_cbm.go — SQL size guard, extractPythonDependsEdges(), extractDLLResolveEdges()
  • internal/pipeline/pipeline.go — extend createLSPStubNodes() to handle dll_resolve strategy

…king, SQL size guard, DLL calls

  - Fix DeusData#28: Restrict source-based route extractors (Go/Express/Laravel/Ktor)
    to their own file extensions. Prevents Python dict .get() from matching
    Ktor route regex and creating ~125 spurious Route nodes.

  - Fix DeusData#27: Track FastAPI Depends(func_ref) in parameter defaults as CALLS
    edges. Scans Python function signatures for Depends() patterns so
    dependency-injected functions (e.g. get_current_user) no longer appear
    as dead code with in_degree=0.

  - Fix DeusData#62: Add file size guard in cbmParseFile() to prevent tree-sitter
    SQL parser stack overflow on large .sql files (bulk INSERTs). SQL files
    >1MB and any file >4MB are skipped with a logged warning.

  - Fix DeusData#29: Detect dynamic DLL resolution patterns (GetProcAddress, dlsym,
    Resolve) in C/C++ source and create CALLS edges to synthetic stub nodes
    with dll_name/dll_function metadata.
…king, SQL size guard, DLL calls

  - Fix DeusData#28: Restrict source-based route extractors (Go/Express/Laravel/Ktor)
    to their own file extensions. Prevents Python dict .get() from matching
    Ktor route regex and creating ~125 spurious Route nodes.

  - Fix DeusData#27: Track FastAPI Depends(func_ref) in parameter defaults as CALLS
    edges. Scans Python function signatures for Depends() patterns so
    dependency-injected functions no longer appear as dead code (in_degree=0).
    Includes fallback for import aliases (e.g. `import X as _Y`).

  - Fix DeusData#62: Add file size guard in cbmParseFile() to prevent tree-sitter
    SQL parser stack overflow on large .sql files (bulk INSERTs). SQL files
    >1MB and any file >4MB are skipped with a logged warning.

  - Fix DeusData#29: Detect dynamic DLL resolution patterns (GetProcAddress, dlsym,
    Resolve) in C/C++ source and create CALLS edges to synthetic stub nodes
    with dll_name/dll_function metadata.
DeusData added a commit that referenced this pull request Mar 20, 2026
- FastAPI Depends(func_ref): scans Python function signatures for
  Depends() patterns and creates CALLS edges to the dependency function.
  Without this, auth/DI functions appear as dead code (in_degree=0).
- DLL resolve: scans C/C++ source for GetProcAddress/dlsym/Resolve
  patterns and creates CALLS edges to synthetic stub nodes, enabling
  call graph tracking across DLL boundaries.
- Extension scoping (#28) was already ported in 6a2b1f5.
- SQL size guard (#62) not needed: our C workers use 8MB stacks.

Tests: 3 new (httplink_laravel_path_filter, pipeline_fastapi_depends_edges,
pipeline_dll_resolve_edges). Total: 2041.

Co-Authored-By: kingchenc <kingchenc@users.noreply.github.com>
Co-Authored-By: mariomeyer <mariomeyer@users.noreply.github.com>
@DeusData
Owner

Thanks for the thorough work across four issues — the extension scoping, FastAPI Depends tracking, DLL resolve edges, and SQL analysis are all well-engineered.

We've ported all applicable fixes to the C codebase:

All three ported features have integration tests (2041 total). You're credited as co-author on both commits. Closing since this is resolved on main — thanks for the contribution!

@DeusData DeusData closed this Mar 20, 2026